a continuous distribution F, and we prove a central limit theorem for the

1. Introduction

In a typical on-line selection problem, a decision maker is presented with n 
random values in sequence and must decide whether to accept or reject each newly 
presented value. In the most famous such problem, the decision maker gets to make 
only a single choice, and the goal is to maximize the probability that the selected 
value is the best out of all n values. A problem of this kind was considered by 
Cayley (1875), but the modern developments begin in the 1960s with notable work
by Lindley (1961) and Dynkin (1963). Samuels (1991) gives a thoughtful survey of 
much of the earlier literature, and connections to more recent work are given by 
Krieger and Samuel-Cahn (2009), Buchbinder, Jain and Singh (2010), and Bateni,
Hajiaghayi and Zadimoghaddam (2010). 

In some more combinatorial problems, the decision maker makes multiple sequential selections from the sequence of presented values, and the objective is to maximize the expected number of selected elements, subject to appropriate combinatorial constraints. For example, one can consider the optimal sequential selection of a monotone increasing subsequence. This on-line selection problem was first studied in Samuels and Steele (1981), and it has been analyzed more recently in Gnedin (1999; 2000a; 2000b), Baryshnikov and Gnedin (2000), and Bruss and Delbaen (2001). The present investigation is particularly motivated by the work of Bruss and Delbaen (2004), which establishes a central limit theorem for the sequential selection of a monotone subsequence when the number $N$ of values offered to the decision maker is a Poisson random variable that is independent of the sequence $X_1, X_2, \ldots$ of independent random variables with common continuous distribution.

Here, we consider the problem of making on-line selection of an alternating subsequence:

$$X_{i_1} > X_{i_2} < X_{i_3} > \cdots \lessgtr X_{i_k} \quad \text{with } 1 \le i_1 < i_2 < \cdots < i_k \le n.$$
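To make the pattern concrete, here is a small illustrative Python check. The helper `is_alternating` is hypothetical (it is not part of the analysis); it simply tests whether a finite list of selected values alternates in the displayed way, starting with a descent.

```python
def is_alternating(values):
    """Return True if values[0] > values[1] < values[2] > ... holds strictly.

    Hypothetical helper: it checks the displayed pattern, which starts
    with a descent and then alternates.
    """
    for j in range(len(values) - 1):
        if j % 2 == 0:
            if not values[j] > values[j + 1]:   # even steps must go down
                return False
        else:
            if not values[j] < values[j + 1]:   # odd steps must go up
                return False
    return True


print(is_alternating([0.9, 0.1, 0.5, 0.2]))  # True
print(is_alternating([0.1, 0.9]))            # False
```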

To be completely explicit, we consider the class $\Pi$ of Markov deterministic policies that are adapted to the sequence $\{X_i : 1 \le i \le n\}$, so the decision to accept or reject the value $X_i$ that is offered at time $i$ is a deterministic function of the vector $(X_1, X_2, \ldots, X_i)$. One might also consider randomized policies, but standard results in dynamic programming confirm that there is no profit in doing so here (cf. Bertsekas and Shreve, 1978, Corollary 8.5.1).

Given a policy $\pi \in \Pi$, we denote by $A_n^o(\pi)$ the number of selections made by $\pi$ for the realization $\{X_1, X_2, \ldots, X_n\}$. It was found in Arlotto, Chen, Shepp and Steele (2011) that for each $n$ there is a unique policy $\pi_n^* \in \Pi$ such that

$$E[A_n^o(\pi_n^*)] = \sup_{\pi \in \Pi} E[A_n^o(\pi)],$$

and it was also found there that the behavior of the mean is very well controlled; specifically, one has

$$E[A_n^o(\pi_n^*)] = (2 - \sqrt{2})\,n + O(1). \tag{1}$$

Here our main goal is to show that $A_n^o(\pi_n^*)$ satisfies a central limit theorem.

Theorem 1 (Central Limit Theorem for Optimal On-line Alternating Selection). There is a constant $0 < \sigma^2 < \infty$ such that

$$\frac{A_n^o(\pi_n^*) - (2 - \sqrt{2})\,n}{\sqrt{n}} \Longrightarrow N(0, \sigma^2) \quad \text{as } n \to \infty.$$

The value of $\sigma^2$ is not known exactly, but we give a formula that expresses $\sigma^2$ as an infinite series. Monte Carlo calculations suggest that $\sigma^2 \approx 0.3096$, but the determination of an explicit (closed-form) expression for $\sigma^2$ remains an open problem. It may even be a tractable one, though it is unlikely to be easy.

Nature and Organization of the Analysis 

Our proof of Theorem 1 rests on the sustained investigation of the value functions that are determined by the recursive Bellman equation for our problem. The first implication of this analysis is that the optimal policy $\pi_n^*$ is a threshold policy, i.e., the policy is characterized completely by a set $\{g_n, g_{n-1}, \ldots, g_1\}$ of time-dependent threshold functions that tell us when to accept or reject a newly presented value. An early step in our investigation was the numerical calculation of these threshold functions, and we would have had little chance to develop the argument given here without the detailed guidance that is summarized in Figure 1.

Section 2 recalls those results from earlier work that are needed to make the present arguments self-contained. The main fact we need is the convenient form (3) of the Bellman equation, but we also make essential use of a technical property of the value functions that is given by (4), which we call restricted supermodularity.

Sections 3 through 6 develop the geometry of the threshold functions. Here we have been as systematic as possible so that one might see the features of our analysis that might carry over to other Markov decision problems. Roughly speaking, one frames concrete hypotheses based on the suggestions of Figure 1, and one proves these hypotheses by inductions that are assisted by two flavors of supermodularity. The specific inferences are particular to the problem of alternating selections, but the general pattern should apply to many Markov decision problems.

Sections 7 and 8 exploit the geometrical characterization of the threshold functions to obtain information about the distribution of $A_n^o(\pi_n^*)$, the number of selections made by the optimal policy for the problem with time horizon $n$. The main step here is the introduction of a horizon-independent policy $\pi_\infty$ that is determined by the limit of the threshold functions that define $\pi_n^*$. It is relatively easy to check that the number of selections $A_n^o(\pi_\infty)$ made by this policy is a Markov additive functional of a stationary, uniformly ergodic Markov chain. One can use off-the-shelf results to confirm that the central limit theorem holds for $A_n^o(\pi_\infty)$, provided that one shows that the variance of $A_n^o(\pi_\infty)$ is not $o(n)$. We then complete the proof of Theorem 1 by showing that there is a coupling under which $A_n^o(\pi_n^*)$ and $A_n^o(\pi_\infty)$ are close in $L^2$; specifically, we show

$$\big\| A_n^o(\pi_n^*) - A_n^o(\pi_\infty) - E[A_n^o(\pi_n^*) - A_n^o(\pi_\infty)] \big\|_2 = o(\sqrt{n}).$$



2. The Bellman Equation and the Optimal Policy 

Since the distribution $F$ is continuous and since the problem is unchanged if we replace $X_i$ by $U_i = F(X_i)$, we can assume without loss of generality that the $X_i$'s are uniformly distributed on $[0,1]$. To make our analysis self-contained (and to avoid repetition), we need to recall a few facts from Arlotto, Chen, Shepp and Steele (2011).

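First, there is a sequence $\{g_n, g_{n-1}, \ldots, g_1\}$ of functions $g_k : [0,1] \to [0,1]$ such that, if we set $Y_0 = 0$ and define $Y_i$ recursively for $1 \le i \le n$ by

$$Y_i = \begin{cases} Y_{i-1} & \text{if } X_i < g_{n-i+1}(Y_{i-1}), \\ 1 - X_i & \text{if } X_i \ge g_{n-i+1}(Y_{i-1}), \end{cases}$$

then the number of selections made by the optimal on-line policy $\pi_n^*$ is given by

$$A_n^o(\pi_n^*) = \sum_{i=1}^{n} \mathbb{1}\big(X_i \ge g_{n-i+1}(Y_{i-1})\big). \tag{2}$$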
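As a sketch of how (2) is evaluated on a single realization, the following Python function simulates a generic threshold policy. The interface `g(k, y)` (threshold when `k` observations remain, counting the current one, and the state is `y`) is an assumption made for illustration. The optimal thresholds $g_k$ are not available in closed form, so the usage example plugs in the greedy rule $g_k(y) = y$, which accepts every feasible value.

```python
from typing import Callable, Sequence


def count_selections(xs: Sequence[float],
                     g: Callable[[int, float], float]) -> int:
    """Count the selections of a threshold policy, following recursion (2).

    xs : observed values X_1, ..., X_n, assumed uniform on [0, 1].
    g  : hypothetical interface; g(k, y) is the threshold when k observations
         remain (including the current one) and the current state is y.
    """
    n = len(xs)
    y = 0.0                      # Y_0 = 0
    selections = 0
    for i, x in enumerate(xs, start=1):
        k = n - i + 1            # observations yet to be seen at time i
        if x >= g(k, y):         # accept: the state jumps to 1 - X_i
            selections += 1
            y = 1.0 - x
        # otherwise reject: Y_i = Y_{i-1}
    return selections


def greedy(k: int, y: float) -> float:
    """Threshold g_k(y) = y: accept any feasible value."""
    return y


print(count_selections([0.2, 0.9, 0.3], greedy))       # 3
print(count_selections([0.2, 0.5, 0.7, 0.6], greedy))  # 1
```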

We also need a few facts about the value functions $v_k : [0,1] \to \mathbb{R}$ that are defined for $1 \le k \le n$ by the expected sum

$$v_k(y) = E\Big[\sum_{i=n-k+1}^{n} \mathbb{1}\big(X_i \ge g_{n-i+1}(Y_{i-1})\big) \,\Big|\, Y_{n-k} = y\Big].$$

In words, $v_k(y)$ is the expected number of selections yet to be made by the optimal policy $\pi_n^*$ when the number of observations yet to be seen is $k$ and the current state $Y_{n-k}$ equals $y$. In particular, we have $v_n(0) = E[A_n^o(\pi_n^*)]$.

The optimality principle of dynamic programming tells us that the value functions $v_k(\cdot)$, $1 \le k < \infty$, can be recursively determined. Specifically, if we set $v_0(y) = 0$ for all $y \in [0,1]$, then the functions $v_k(\cdot)$, $1 \le k < \infty$, satisfy the Bellman equation

$$v_k(y) = y\,v_{k-1}(y) + \int_y^1 \max\{v_{k-1}(y),\, 1 + v_{k-1}(1-x)\}\,dx. \tag{3}$$

One implication of this recursion is that the value functions $v_k(\cdot)$, $1 \le k < \infty$, are continuous and differentiable.
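The recursion (3) is also easy to solve numerically, which is one way to produce pictures like Figure 1. The sketch below is an illustration under our own assumptions, not the computation behind the figure: it discretizes $[0,1]$ on a uniform grid (so that $1 - y_j$ is again a grid point) and approximates the integral by the trapezoid rule. The grid size `n_grid` is an arbitrary choice. The sketch checks itself against $v_1(y) = 1 - y$, $v_2(y) = (3/2)(1 - y^2)$, and the closed form for $v_3$ derived in Section 4.

```python
import numpy as np


def value_functions(k_max, n_grid=600):
    """Approximate v_0, ..., v_{k_max} from the Bellman recursion (3).

    Grid y_j = j / n_grid, so 1 - y_j = y_{n_grid - j} is also a grid point;
    the integral over x in [y, 1] is approximated by the trapezoid rule.
    Returns the grid and a list whose k-th entry holds v_k on the grid.
    """
    y = np.linspace(0.0, 1.0, n_grid + 1)
    h = 1.0 / n_grid
    v = [np.zeros_like(y)]                       # v_0 = 0
    for _ in range(k_max):
        prev = v[-1]
        reflected = prev[::-1]                   # reflected[i] = v_{k-1}(1 - y_i)
        new = np.empty_like(y)
        for j in range(n_grid + 1):
            # integrand max{v_{k-1}(y_j), 1 + v_{k-1}(1 - x)} at x = y_j, ..., 1
            f = np.maximum(prev[j], 1.0 + reflected[j:])
            integral = h * (f.sum() - 0.5 * (f[0] + f[-1]))
            new[j] = y[j] * prev[j] + integral
        v.append(new)
    return y, v


def v3_closed_form(y):
    """Closed form of v_3 from Section 4 (vectorized over a NumPy array)."""
    small = 1.5 * (1 - y**2) + (2.0 / 3.0 + y**2) ** 1.5
    large = 0.5 * (1 - y) * (4 + 5 * y + 2 * y**2)
    return np.where(y <= 1.0 / 6.0, small, large)


y, v = value_functions(3)
print(np.max(np.abs(v[1] - (1 - y))))            # rounding error only
print(np.max(np.abs(v[2] - 1.5 * (1 - y**2))))   # rounding error only
print(np.max(np.abs(v[3] - v3_closed_form(y))))  # small trapezoid error
```

Because the integrands for $v_1$ and $v_2$ are linear in $x$, the trapezoid rule is exact there; the first discretization error appears at $k = 3$, where the integrand has a kink at $x = g_3(y)$.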

The last fact we need from our earlier analysis is that the value functions have a certain restricted supermodularity property. Specifically, they satisfy the inequality

$$v_k(u) - v_k(1-y) \le v_{k+1}(u) - v_{k+1}(1-y) \quad \text{for all } y \in [0, 1/2] \text{ and } u \in [y, 1-y]. \tag{4}$$

This property may seem technical, but one cannot do without it. It provides an 
essential link in several of our induction arguments. 

The proof of the central limit theorem for $A_n^o(\pi_n^*)$ requires a reasonably detailed understanding of both the threshold functions $g_k(\cdot)$, $1 \le k < \infty$, and the value functions $v_k(\cdot)$, $1 \le k < \infty$. Principally, we need to prove all of the monotonicity and convergence properties that are suggested by Figure 1, and this requires a sustained analysis of the Bellman equation (3).

Assuming the few facts reviewed above, the development given here can be read independently of our earlier work. Still, for the purpose of comparison, we should note that the notation used here simplifies our earlier notation in some significant ways. For example, we now take $k$ to be the number of observations yet to be seen, and this gives us the pleasing formulation (3) of the Bellman equation. We also write $g_k(y)$ for the threshold function when there are $k$ observations yet to be seen, and this replaces the earlier, more cumbersome, notation $f_{n-k+1,n}(y)$.



3. Geometry of the Value and Threshold Functions 

Figure 1 gives a highly suggestive picture of the individual threshold functions $g_k(\cdot)$, and it foretells much of the story about how they behave as $k \to \infty$. Analytical confirmation of these suggestions is the central challenge. The path to understanding the threshold functions goes through the value functions, and we begin by proving the very plausible fact that the value functions are strictly decreasing.

Lemma 2 (Strict Monotonicity of the Value Functions). For each $1 \le k < \infty$, the value function $y \mapsto v_k(y)$ defined by the Bellman recursion (3) is strictly decreasing on $[0,1]$.

Proof. The proof uses induction on the sequence of hypotheses:

$$H_k : v_k(y + \epsilon) < v_k(y) \quad \text{for all } y \in [0, 1) \text{ and all } \epsilon > 0 \text{ such that } y + \epsilon \le 1.$$
Figure 1. The threshold functions $g_k$, $1 \le k \le 10$ (solid lines), and their limit as $k \to \infty$ (dashed line) for $y \in [0, 35/100]$. The plot suggests most of the analytical properties that are needed for the proof of the central limit theorem.




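Since $v_1(y) = 1 - y$, $H_1$ is true. For $k \ge 2$, we note by the Bellman recursion (3) that we have

$$\begin{aligned}
v_k(y+\epsilon) - v_k(y) &= (y+\epsilon)\,v_{k-1}(y+\epsilon) + \int_{y+\epsilon}^1 \max\{v_{k-1}(y+\epsilon),\, 1 + v_{k-1}(1-x)\}\,dx \\
&\quad - y\,v_{k-1}(y) - \int_y^1 \max\{v_{k-1}(y),\, 1 + v_{k-1}(1-x)\}\,dx \\
&\le (y+\epsilon)\,v_{k-1}(y+\epsilon) + \int_{y+\epsilon}^1 \max\{v_{k-1}(y),\, 1 + v_{k-1}(1-x)\}\,dx \\
&\quad - (y+\epsilon)\,v_{k-1}(y) - \int_{y+\epsilon}^1 \max\{v_{k-1}(y),\, 1 + v_{k-1}(1-x)\}\,dx \\
&= (y+\epsilon)\,\{v_{k-1}(y+\epsilon) - v_{k-1}(y)\} < 0,
\end{aligned}$$

where the first inequality of the chain follows from $H_{k-1}$ (which lets us replace $v_{k-1}(y+\epsilon)$ by the larger $v_{k-1}(y)$ inside the first maximum) together with the bound

$$\epsilon\, v_{k-1}(y) \le \int_y^{y+\epsilon} \max\{v_{k-1}(y),\, 1 + v_{k-1}(1-x)\}\,dx,$$

and the second inequality follows from $H_{k-1}$. This completes the proof of $H_k$, and of the lemma. □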



An important benefit of the Bellman recursion (3) is that it provides us with a variational characterization of the threshold functions $g_k(\cdot)$, $1 \le k < \infty$. Specifically, we have the identity

$$g_k(y) = \inf\{x \in [y, 1] : v_{k-1}(y) \le 1 + v_{k-1}(1-x)\}, \tag{5}$$

and, if $y$ is such that $v_{k-1}(y) \ge 1 + v_{k-1}(1-y)$, then $g_k(y)$ is the value for which the decision maker is indifferent between selecting the current observation $x$ (and changing the state of the system from $y$ to $1-x$) and rejecting $x$ (and leaving the state of the system, $y$, unchanged). By the strict monotonicity of $v_{k-1}(\cdot)$, we then see that $g_k(y)$ is uniquely determined for each $y \in [0,1]$.
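For a concrete instance of (5), take $k = 3$, where $v_2(y) = (3/2)(1 - y^2)$ is known explicitly (see Section 4). Because $x \mapsto 1 + v_{k-1}(1 - x)$ is increasing by Lemma 2, the infimum in (5) can be located by bisection. The following Python sketch does this and compares the result with the closed form $g_3(y) = \max\{1 - \sqrt{2/3 + y^2},\, y\}$ that appears in the proof of Proposition 5. The function name `threshold` and the tolerance are our own illustrative choices.

```python
import math


def v2(y: float) -> float:
    """v_2(y) = (3/2)(1 - y^2)."""
    return 1.5 * (1.0 - y * y)


def threshold(v_prev, y: float, tol: float = 1e-12) -> float:
    """g_k(y) = inf{x in [y, 1] : v_{k-1}(y) <= 1 + v_{k-1}(1 - x)}, cf. (5).

    Relies on x -> 1 + v_{k-1}(1 - x) being increasing (Lemma 2), so the
    condition fails on [y, g_k(y)) and holds on [g_k(y), 1].
    """
    target = v_prev(y)
    if target <= 1.0 + v_prev(1.0 - y):    # x = y already qualifies
        return y
    lo, hi = y, 1.0                        # condition fails at lo, holds at hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if target <= 1.0 + v_prev(1.0 - mid):
            hi = mid
        else:
            lo = mid
    return hi


for y in [0.0, 0.05, 0.1, 0.2, 0.5]:
    closed = max(1.0 - math.sqrt(2.0 / 3.0 + y * y), y)
    print(y, threshold(v2, y), closed)
```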

Figure 1 further suggests that the threshold functions have a long interval of fixed points; the next lemma partially confirms this.

Lemma 3 (Range of Fixed Points). For all $k \ge 1$ and $y \in [0,1]$ we have

$$v_k(y) - v_k(2/3) \le v_k(0) - v_k(2/3) \le 1. \tag{6}$$

In particular, for all $k \ge 1$ we have

$$g_k(y) = y \quad \text{for all } y \in [1/3, 1] \tag{7}$$

and

$$g_k(y) \le 1/3 \quad \text{for all } y \in [0, 1/3]. \tag{8}$$

Proof. The first inequality of (6) is trivial since the map $y \mapsto v_k(y)$ is strictly decreasing in $y$. Also, the relations (7) and (8) are immediate from the variational characterization (5) and the bound (6), applied with $k-1$ in place of $k$.

The real task is to prove the second inequality of (6). This time we use induction on the hypotheses given by

$$H_k : v_k(0) - v_k(2/3) \le 1, \quad \text{for } 1 \le k < \infty. \tag{9}$$

As before, $v_1(y) = 1 - y$, so $H_1$ is trivially true. Now, when we apply the Bellman recursion (3) with $y = 0$ and $y = 2/3$, we get

$$v_k(0) - v_k(2/3) = \int_0^1 \max\{v_{k-1}(0),\, 1 + v_{k-1}(1-u)\}\,du - (2/3)\,v_{k-1}(2/3) - \int_{2/3}^1 \max\{v_{k-1}(2/3),\, 1 + v_{k-1}(1-u)\}\,du,$$

from which the change of variables $u \mapsto 1 - u$, together with $(2/3)\,v_{k-1}(2/3) = \int_{1/3}^1 v_{k-1}(2/3)\,du$, gives

$$v_k(0) - v_k(2/3) = \int_0^{1/3} I_1(u)\,du + \int_{1/3}^1 I_2(u)\,du, \tag{10}$$

where $I_1(u)$ and $I_2(u)$ are defined by

$$I_1(u) = \max\{v_{k-1}(0),\, 1 + v_{k-1}(u)\} - \max\{v_{k-1}(2/3),\, 1 + v_{k-1}(u)\}$$

and

$$I_2(u) = \max\{v_{k-1}(0) - v_{k-1}(2/3),\, 1 + v_{k-1}(u) - v_{k-1}(2/3)\}.$$

For the first integrand, $I_1(u)$, we note that

$$I_1(u) = \max\{v_{k-1}(0) - v_{k-1}(2/3),\, 1 + v_{k-1}(u) - v_{k-1}(2/3)\} - \max\{0,\, 1 + v_{k-1}(u) - v_{k-1}(2/3)\}. \tag{11}$$

The induction assumption $H_{k-1}$ then tells us that

$$v_{k-1}(0) - v_{k-1}(2/3) \le 1,$$

and the strict monotonicity of the value function on $[0,1]$ yields

$$1 \le 1 + v_{k-1}(u) - v_{k-1}(2/3) \quad \text{for all } u \in [0, 1/3].$$

Thus, both the first and the second addend in (11) equal the right maximand, and

$$I_1(u) = 0 \quad \text{for all } u \in [0, 1/3], \tag{12}$$

so the first integral in (10) vanishes.

To estimate $I_2(u)$, note that $H_{k-1}$ and the monotonicity of $y \mapsto v_{k-1}(y)$ tell us that

• if $u \in [1/3, 2/3]$, then $I_2(u) = 1 + v_{k-1}(u) - v_{k-1}(2/3) \le 1 + v_{k-1}(0) - v_{k-1}(2/3) \le 2$, and

• if $u \in [2/3, 1]$, then $I_2(u) = \max\{v_{k-1}(0) - v_{k-1}(2/3),\, 1 + v_{k-1}(u) - v_{k-1}(2/3)\} \le 1$.

Now we just calculate

$$v_k(0) - v_k(2/3) = \int_{1/3}^1 I_2(u)\,du \le \int_{1/3}^{2/3} 2\,du + \int_{2/3}^1 1\,du = 1,$$

and thus we complete the proof of (6). □

From Lemma 3 we know that a threshold function $g_k$ has many fixed points; in particular, $g_k(y) = y$ if $y \in [1/3, 1]$. Figure 1 further suggests that much of the geometry of $g_k$ is governed by its minimal fixed point:

$$\xi_k = \inf\{y : g_k(y) = y\}. \tag{13}$$

The value $\xi_k$ also has a useful policy interpretation. If the value $y$ of the last observation selected is bigger than $\xi_k$, then the decision maker follows a greedy policy; he accepts any feasible arriving observation. On the other hand, if $y < \xi_k$, the decision maker acts conservatively; his choices are governed by the value of the threshold $g_k(y)$. Finally, if $y = \xi_k$, the greedy policy and the optimal policy agree. This interpretation of $\xi_k$ is formalized in the next lemma, where we also prove that the sequence $\{\xi_k : k = 1, 2, \ldots\}$ is non-decreasing.
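By (5), $g_k(y) = y$ exactly when $v_{k-1}(y) - v_{k-1}(1 - y) \le 1$, so the minimal fixed point (13) can be computed by bisection whenever $v_{k-1}$ is available. The Python sketch below uses the closed forms of $v_2$ and $v_3$ from Section 4 and assumes, as Lemma 4 establishes, that $y \mapsto v_{k-1}(y) - v_{k-1}(1-y)$ is decreasing. It recovers $\xi_3 = 1/6$ and gives a value of $\xi_4$ consistent with the ordering (14). The function names are hypothetical.

```python
def v2(y):
    return 1.5 * (1.0 - y * y)


def v3(y):
    # closed form of v_3 derived in Section 4
    if y <= 1.0 / 6.0:
        return 1.5 * (1.0 - y * y) + (2.0 / 3.0 + y * y) ** 1.5
    return 0.5 * (1.0 - y) * (4.0 + 5.0 * y + 2.0 * y * y)


def minimal_fixed_point(v_prev, tol=1e-12):
    """xi_k = inf{y : v_{k-1}(y) - v_{k-1}(1 - y) <= 1}, by bisection on [0, 1/3].

    Assumes delta(y) = v_{k-1}(y) - v_{k-1}(1 - y) is decreasing and that
    delta(1/3) <= 1, which (6) guarantees.
    """
    def delta(y):
        return v_prev(y) - v_prev(1.0 - y)

    lo, hi = 0.0, 1.0 / 3.0
    if delta(lo) <= 1.0:
        return 0.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if delta(mid) <= 1.0:
            hi = mid
        else:
            lo = mid
    return hi


xi3 = minimal_fixed_point(v2)
xi4 = minimal_fixed_point(v3)
print(xi3)                        # approximately 1/6
print(xi3 <= xi4 <= 1.0 / 3.0)    # True, as (14) and Lemma 3 require
```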

Lemma 4 (Characterization of the Minimal Fixed Point). For $k \ge 3$, the minimal fixed point $\xi_k = \inf\{y : g_k(y) = y\}$ is the unique solution to the equation

$$v_{k-1}(y) - v_{k-1}(1-y) = 1.$$

Moreover, the minimal fixed points form a non-decreasing sequence, so we have

$$\xi_k \le \xi_{k+1} \quad \text{for all } k \ge 1. \tag{14}$$

Proof. From the variational characterization of $g_k(\cdot)$ we have

$$g_k(y) = \inf\{x \in [y, 1] : v_{k-1}(y) \le 1 + v_{k-1}(1-x)\},$$

so if we set $\delta_k(y) = v_{k-1}(y) - v_{k-1}(1-y)$, then we have

$$g_k(y) = y \quad \text{if and only if} \quad \delta_k(y) \le 1. \tag{15}$$

The Bellman equation (3) for $v_k(\cdot)$ and Lemma 2 tell us that the map $y \mapsto v_{k-1}(y)$ is continuous and strictly decreasing, with $v_1(y) = 1 - y$ and $v_2(y) = (3/2)(1 - y^2)$. Then, the function $\delta_k$ is continuous and strictly decreasing, and for $k \ge 3$ we have $\delta_k(0) = v_{k-1}(0) \ge v_2(0) = 3/2 > 1$ and $\delta_k(1) = -v_{k-1}(0) < 0$, so there is a unique value $y^*$ such that

$$\delta_k(y^*) = v_{k-1}(y^*) - v_{k-1}(1 - y^*) = 1.$$

Since the map $y \mapsto \delta_k(y)$ is strictly decreasing, we can also write $y^*$ as

$$y^* = \inf\{y : v_{k-1}(y) - v_{k-1}(1-y) \le 1\} = \inf\{y : g_k(y) = y\} = \xi_k,$$

where the second equality follows from (15) and the third equality comes from the definition of $\xi_k$.

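To prove the monotonicity property $\xi_k \le \xi_{k+1}$ for all $k \ge 1$, we first note that since $v_0(y) = 0$ and $v_1(y) = 1 - y$, we have $\xi_1 = \xi_2 = 0$, so (14) is trivial for $k = 1, 2$. For $k \ge 3$, since (7) tells us that every $y \in [1/3, 1]$ is a fixed point, we may restrict the infima to $[0, 1/3]$, and we find

$$\begin{aligned}
\xi_k &= \inf\{y \in [0, 1/3] : g_k(y) = y\} \\
&= \inf\{y \in [0, 1/3] : \delta_k(y) = v_{k-1}(y) - v_{k-1}(1-y) \le 1\} \\
&\overset{(16)}{\le} \inf\{y \in [0, 1/3] : \delta_{k+1}(y) = v_k(y) - v_k(1-y) \le 1\} \\
&= \inf\{y \in [0, 1/3] : g_{k+1}(y) = y\} = \xi_{k+1},
\end{aligned}$$

where the one inequality (16) follows from restricted supermodularity (4), applied with $u = y$. □

4. A Second Supermodularity of the Bellman Recurrence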

The value functions have a second supermodularity property that provides some crucial help. Specifically, we need it to show that the threshold functions $g_k$ are non-decreasing in $k$. This monotonicity moves us a long way toward an exhaustive understanding of the asymptotic behavior of the threshold functions.

Proposition 5 (Second Supermodularity). For all $k \ge 3$, the value functions defined by the Bellman recursion (3) satisfy the bound

$$v_{k-1}(y) - v_{k-1}(1-x) \le v_k(y) - v_k(1-x) \quad \text{for all } y \le \xi_k \text{ and } x \in [y, g_k(y)]. \tag{17}$$

Proof. We again use induction to exploit the Bellman equation, and this time the sequence of hypotheses is given by

$$H_k : v_{k-1}(y) - v_{k-1}(1-x) \le v_k(y) - v_k(1-x), \quad \text{for all } y \le \xi_k \text{ and } x \in [y, g_k(y)].$$

We first prove $H_3$, which we then use as the base case for our induction. We recall that $v_1(y) = 1 - y$ and, if we use the Bellman recursion (3), we obtain that $v_2(y) = (3/2)(1 - y^2)$. In turn, this implies $g_3(y) = \max\{1 - \sqrt{2/3 + y^2},\, y\}$ and $\xi_3 = 1/6$. To calculate $v_3(y)$ we apply the Bellman recursion one more time, and we obtain a messier but still tractable formula:

$$v_3(y) = \begin{cases} (3/2)(1 - y^2) + 3^{-3/2}(2 + 3y^2)^{3/2} & \text{if } y \le 1/6, \\ (1/2)(1 - y)(4 + 5y + 2y^2) & \text{if } y > 1/6. \end{cases}$$

Thus, for $y \le \xi_3 = 1/6$, we need to show

$$v_2(y) - v_2(1-x) \le v_3(y) - v_3(1-x) \quad \text{for all } x \in [y, g_3(y)],$$

where $g_3(y) = 1 - \sqrt{2/3 + y^2}$. Since $g_3(y) \le 1 - \sqrt{2/3}$, every relevant $x$ satisfies $1 - x > 1/6$, so the second branch of the formula applies to $v_3(1-x)$. From our explicit formulas for $v_2(\cdot)$ and $v_3(\cdot)$ we then have

$$v_3(1-x) - v_2(1-x) = (5/2)x - 3x^2 + x^3$$

and

$$v_3(y) - v_2(y) = 3^{-3/2}(2 + 3y^2)^{3/2} \ge (2/3)^{3/2} \approx 0.5443.$$

Calculus shows that $(5/2)x - 3x^2 + x^3$ increases on $0 \le x \le 1 - \sqrt{2/3}$ and attains an endpoint maximum of $(1/18)(9 - \sqrt{6}) \approx 0.3640$. Thus, we find

$$v_3(1-x) - v_2(1-x) \le (1/18)(9 - \sqrt{6}) < (2/3)^{3/2} \le v_3(y) - v_2(y)$$

for all $y \le 1/6$ and $y \le x \le 1 - \sqrt{2/3 + y^2}$, completing the proof of $H_3$.
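The calculus in this base case is easy to double-check numerically. The Python sketch below evaluates the closed forms of $v_2$ and $v_3$ given above, confirms the identity for $v_3(1-x) - v_2(1-x)$ and the endpoint value $(9 - \sqrt{6})/18$, and then scans a grid of pairs $(y, x)$ with $0 \le y \le 1/6$ and $y \le x \le g_3(y)$. A grid scan is only a sanity check, not a proof; the grid resolution is an arbitrary choice.

```python
import math


def v2(y):
    return 1.5 * (1.0 - y * y)


def v3(y):
    # closed form of v_3 displayed above
    if y <= 1.0 / 6.0:
        return 1.5 * (1.0 - y * y) + 3.0 ** -1.5 * (2.0 + 3.0 * y * y) ** 1.5
    return 0.5 * (1.0 - y) * (4.0 + 5.0 * y + 2.0 * y * y)


def poly(x):
    return 2.5 * x - 3.0 * x**2 + x**3          # (5/2)x - 3x^2 + x^3


# identity v_3(1 - x) - v_2(1 - x) = (5/2)x - 3x^2 + x^3 (here 1 - x > 1/6)
for x in [0.0, 0.05, 0.1, 0.18]:
    assert abs((v3(1 - x) - v2(1 - x)) - poly(x)) < 1e-12

x_max = 1.0 - math.sqrt(2.0 / 3.0)
endpoint = poly(x_max)
print(endpoint, (9 - math.sqrt(6)) / 18)       # both about 0.3640
print((2.0 / 3.0) ** 1.5)                       # about 0.5443

# sanity scan of H_3 over pairs (y, x) with 0 <= y <= 1/6 and y <= x <= g_3(y)
worst = math.inf
for i in range(101):
    y = (1.0 / 6.0) * i / 100
    g3 = 1.0 - math.sqrt(2.0 / 3.0 + y * y)
    for j in range(101):
        x = y + (g3 - y) * j / 100
        gap = (v3(y) - v3(1 - x)) - (v2(y) - v2(1 - x))
        worst = min(worst, gap)
print(worst)                                     # positive, so H_3 holds on the grid
```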

We now suppose that $H_k$ holds, and we seek to show $H_{k+1}$. First, from the variational characterization of $g_k(\cdot)$ and the definition of $\xi_k$, recall that

$$1 \le v_{k-1}(y) - v_{k-1}(1-x) \quad \text{for } y \le \xi_k \text{ and } x \in [y, g_k(y)],$$

which, together with the induction assumption $H_k$, implies

$$1 \le v_{k-1}(y) - v_{k-1}(1-x) \le v_k(y) - v_k(1-x) \quad \text{for } y \le \xi_k \text{ and } x \in [y, g_k(y)]. \tag{18}$$

The second inequality in (18) and the variational characterization (5) give us

$$g_k(y) \le g_{k+1}(y) \quad \text{for all } y \le \xi_k.$$

Moreover, if $x \in [g_k(y), g_{k+1}(y)]$, the variational characterizations of $g_k(\cdot)$ and $g_{k+1}(\cdot)$ also give

$$v_{k-1}(y) - v_{k-1}(1-x) \le 1 \le v_k(y) - v_k(1-x) \quad \text{for } y \le \xi_k \text{ and } x \in [g_k(y), g_{k+1}(y)],$$

which combines with (18) to give the crucial inequality

$$v_{k-1}(y) - v_{k-1}(1-x) \le v_k(y) - v_k(1-x) \quad \text{for } y \le \xi_k \text{ and } x \in [y, g_{k+1}(y)]. \tag{19}$$

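From an application of the Bellman recursion (3) for $y \le \xi_k$ and $x \in [y, g_{k+1}(y)]$, we obtain

$$v_k(y) - v_k(1-x) = y\,\big(v_{k-1}(y) - v_{k-1}(1-x)\big) + \int_y^{1-x} \max\{v_{k-1}(y) - v_{k-1}(1-x),\, 1 + v_{k-1}(1-u) - v_{k-1}(1-x)\}\,du. \tag{20}$$

Here we used (7) to write $v_k(1-x) = (1-x)\,v_{k-1}(1-x) + \int_{1-x}^1 \{1 + v_{k-1}(1-u)\}\,du$, together with the fact, from (6) and $x \le 1/3$, that $v_{k-1}(y) \le 1 + v_{k-1}(1-u)$ for $u \in [1-x, 1]$. If we now change variable in the last integral by replacing $u$ with $1 - u$, then the range of integration changes to $[x, 1-y]$, and we can rewrite (20) as

$$\begin{aligned}
v_k(y) - v_k(1-x) &= y\,\big(v_{k-1}(y) - v_{k-1}(1-x)\big) \\
&\quad + \int_x^{1-x} \max\{v_{k-1}(y) - v_{k-1}(1-x),\, 1 + v_{k-1}(u) - v_{k-1}(1-x)\}\,du \\
&\quad + \int_{1-x}^{1-y} \max\{v_{k-1}(y) - v_{k-1}(1-x),\, 1 + v_{k-1}(u) - v_{k-1}(1-x)\}\,du.
\end{aligned}$$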


In this last equation we see that we can use our crucial inequality (19) to bound the first addend and the left maximand of the other two addends. Moreover, since $x \le g_{k+1}(y) \le 1/3$, we can appeal to the restricted supermodularity (4) to bound the right maximand of the second addend. In doing so, we obtain

$$\begin{aligned}
v_k(y) - v_k(1-x) &\le y\,\big(v_k(y) - v_k(1-x)\big) \\
&\quad + \int_x^{1-x} \max\{v_k(y) - v_k(1-x),\, 1 + v_k(u) - v_k(1-x)\}\,du \\
&\quad + \int_{1-x}^{1-y} \max\{v_k(y) - v_k(1-x),\, 1 + v_{k-1}(u) - v_{k-1}(1-x)\}\,du.
\end{aligned} \tag{21}$$
We now observe that the monotonicity of the map $u \mapsto v_{k-1}(u)$ for $u \in [1-x, 1-y]$ and the variational characterization of $g_{k+1}(\cdot)$ combine to give

$$1 + v_{k-1}(u) - v_{k-1}(1-x) \le 1 \le v_k(y) - v_k(1-x)$$

for all $y \le \xi_k$ and $x \in [y, g_{k+1}(y)]$. Hence, the third integrand in (21) satisfies the equality

$$\max\{v_k(y) - v_k(1-x),\, 1 + v_{k-1}(u) - v_{k-1}(1-x)\} = v_k(y) - v_k(1-x),$$

and an analogous monotonicity argument for $u \in [1-x, 1-y]$ also yields

$$\max\{v_k(y) - v_k(1-x),\, 1 + v_k(u) - v_k(1-x)\} = v_k(y) - v_k(1-x).$$

When we use the last two observations in (21), the right-hand side becomes the expression (20) with $k+1$ in place of $k$, so we obtain

$$v_k(y) - v_k(1-x) \le v_{k+1}(y) - v_{k+1}(1-x), \quad \text{for all } y \le \xi_k \text{ and } x \in [y, g_{k+1}(y)].$$

We now conclude our argument by considering values $y \in [\xi_k, \xi_{k+1}]$. From the variational characterization of $g_{k+1}(\cdot)$ and the definition of $\xi_k$, we obtain

$$v_{k-1}(y) - v_{k-1}(1-x) \le 1 \le v_k(y) - v_k(1-x) \quad \text{for } y \in [\xi_k, \xi_{k+1}] \text{ and } x \in [y, g_{k+1}(y)],$$

which can be used instead of (19) to construct an argument similar to the earlier one and conclude that

$$v_k(y) - v_k(1-x) \le v_{k+1}(y) - v_{k+1}(1-x), \quad \text{for } y \in [\xi_k, \xi_{k+1}] \text{ and } x \in [y, g_{k+1}(y)],$$

just as needed to complete the proof of (17). □



The usefulness of the second supermodularity property of Proposition 5 shows itself simply, but clearly, in the following corollary.

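Corollary 6 (Monotonicity of Optimal Thresholds). For all $y \in [0,1]$ the threshold functions satisfy

$$g_k(y) \le g_{k+1}(y) \quad \text{for all } k \ge 1 \tag{22}$$

and

$$1/6 \le g_k(y) \quad \text{for all } k \ge 3. \tag{23}$$

Proof. For $k = 1, 2$ we have $v_0(y) = 0$ and $v_1(y) = 1 - y$, so that

$$g_1(y) = g_2(y) = y,$$

and $g_2(y) \le g_3(y)$ because every threshold satisfies $g_k(y) \ge y$. For $k = 3$, we have already noticed in the course of proving Proposition 5 that $g_3(y) = \max\{1 - \sqrt{2/3 + y^2},\, y\}$, so, in particular, $g_3(y) \ge 1/6$ for $y \in [0,1]$. Finally, for $k \ge 3$, the bound (17) and the variational characterization (5) of the threshold function give us (22) for $y \le \xi_k$, while for $y > \xi_k$ we have $g_k(y) = y \le g_{k+1}(y)$ directly. This confirms (22), and together with $g_3 \ge 1/6$ it gives the lower bound (23). □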
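Corollary 6, together with (7), (8), and (14), can be observed numerically. The sketch below reuses a trapezoid discretization of (3) (an illustrative choice, with the grid size `n_grid` as an assumed parameter), computes grid versions of the thresholds from (5) and of the minimal fixed points (13), and checks the inequalities up to a tolerance of two grid steps to absorb discretization error. The helper names are hypothetical.

```python
import numpy as np


def value_functions(k_max, n_grid=300):
    """Trapezoid-rule approximation of v_0, ..., v_{k_max} from (3) on y_j = j / n_grid."""
    y = np.linspace(0.0, 1.0, n_grid + 1)
    h = 1.0 / n_grid
    v = [np.zeros_like(y)]
    for _ in range(k_max):
        prev, reflected = v[-1], v[-1][::-1]        # reflected[i] = v_{k-1}(1 - y_i)
        new = np.empty_like(y)
        for j in range(n_grid + 1):
            f = np.maximum(prev[j], 1.0 + reflected[j:])
            new[j] = y[j] * prev[j] + h * (f.sum() - 0.5 * (f[0] + f[-1]))
        v.append(new)
    return y, v


def grid_thresholds(y, v_prev):
    """Grid version of (5): first grid point x >= y_j with v_prev(y_j) <= 1 + v_prev(1 - x)."""
    reflected = v_prev[::-1]
    g = np.empty_like(y)
    for j in range(len(y)):
        ok = np.nonzero(v_prev[j] <= 1.0 + reflected[j:])[0]
        g[j] = y[j + ok[0]]          # non-empty: x = 1 always qualifies
    return g


K = 8
y, v = value_functions(K)
h = y[1] - y[0]
g = {k: grid_thresholds(y, v[k - 1]) for k in range(1, K + 1)}
xi = {k: y[np.argmax(g[k] == y)] for k in g}    # first grid fixed point, cf. (13)

tol = 2 * h                                      # absorbs discretization error
upper = y >= 1.0 / 3.0 + tol
for k in range(1, K + 1):
    assert np.all(g[k][upper] == y[upper])       # (7), away from y = 1/3
for k in range(1, K):
    assert np.all(g[k] <= g[k + 1] + tol)        # (22)
    assert xi[k] <= xi[k + 1] + tol              # (14)
for k in range(3, K + 1):
    assert np.all(g[k] >= 1.0 / 6.0 - tol)       # (23)
    assert xi[k] <= 1.0 / 3.0 + tol              # xi_k <= 1/3
print({k: round(float(xi[k]), 3) for k in xi})
```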



We now pursue two further suggestions from Figure 1. Specifically, we show that the limit function $g_\infty$ has exactly the piecewise linear shape that the figure suggests, and we also show that the convergence to $g_\infty$ is uniform. The proof of these facts requires some additional regularity properties that are discussed in the next section.
5. Regularity of the Value and Threshold Functions

The minimal fixed points give us a powerful guide to the geometry of the value function and its derivatives. The connection begins with the Bellman recursion (3) and the variational characterization (5), which together give the identity

$$v_k(y) = g_k(y)\,v_{k-1}(y) + \int_{g_k(y)}^1 \{1 + v_{k-1}(1-x)\}\,dx.$$

If we now differentiate both sides with respect to $y$, we obtain the recursion for the first derivative:

$$v_k'(y) = g_k'(y)\,v_{k-1}(y) + g_k(y)\,v_{k-1}'(y) - g_k'(y)\,\{1 + v_{k-1}(1 - g_k(y))\}.$$

The definition of the minimal fixed point (13) and the variational characterization (5) then give us

$$v_{k-1}(y) = 1 + v_{k-1}(1 - g_k(y)) \quad \text{if } y \le \xi_k, \tag{24}$$

so our recursion for $v_k'(\cdot)$ can be written more informatively as

$$v_k'(y) = \begin{cases} g_k(y)\,v_{k-1}'(y) & \text{if } y \le \xi_k, \\ v_{k-1}(y) - 1 - v_{k-1}(1-y) + y\,v_{k-1}'(y) & \text{if } y > \xi_k. \end{cases} \tag{25}$$

These relations underscore the importance of the minimal fixed points to the geometry of the value function, and they also lead to useful regularity properties.
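For $k = 3$ everything in (25) is explicit, since $v_2$, $v_3$, $g_3$, and $\xi_3 = 1/6$ were computed in Section 4. The following Python check compares a central finite difference of $v_3$ with both lines of (25); the sample points and the step `eps` are arbitrary choices that avoid the breakpoint $y = 1/6$.

```python
import math


def v2(y):
    return 1.5 * (1.0 - y * y)


def dv2(y):
    return -3.0 * y


def v3(y):
    # closed form of v_3 from Section 4
    if y <= 1.0 / 6.0:
        return 1.5 * (1.0 - y * y) + (2.0 / 3.0 + y * y) ** 1.5
    return 0.5 * (1.0 - y) * (4.0 + 5.0 * y + 2.0 * y * y)


def g3(y):
    return max(1.0 - math.sqrt(2.0 / 3.0 + y * y), y)


def rhs_25(y, xi3=1.0 / 6.0):
    """Right-hand side of (25) for k = 3."""
    if y <= xi3:
        return g3(y) * dv2(y)                            # top line
    return v2(y) - 1.0 - v2(1.0 - y) + y * dv2(y)        # bottom line


eps = 1e-6
for y in [0.02, 0.1, 0.15, 0.2, 0.5, 0.9]:
    fd = (v3(y + eps) - v3(y - eps)) / (2 * eps)          # central difference for v_3'(y)
    print(y, fd, rhs_25(y))
    assert abs(fd - rhs_25(y)) < 1e-6
```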

Lemma 7 (Monotonicity Properties of the Derivatives). For all $k \ge 1$, we have

$$-1 \le v_k'(y) \le v_{k+1}'(y) \le 0 \quad \text{for } y \in [0, 1/3] \tag{26}$$

and

$$v_{k+1}'(y) \le v_k'(y) \le -1 \quad \text{for } y \in [1/2, 1]. \tag{27}$$

Proof. We already know from Lemma 2 that $y \mapsto v_k(y)$ is strictly decreasing, so $v_k'(y)$ is non-positive on $[0,1]$. Since $0 \le g_k(y) \le 1$, the top line of (25) tells us that

$$v_{k-1}'(y) \le g_k(y)\,v_{k-1}'(y) = v_k'(y) \quad \text{for } y \le \xi_k. \tag{28}$$

To cover the rest of the range in (26), we use induction on the sequence of hypotheses

$$H_k : v_{k-1}'(y) \le v_k'(y), \quad \text{for all } y \in [\xi_k, 1/3] \text{ and } 2 \le k < \infty.$$

For the base case $H_2$ we have $\xi_2 = 0$, $v_1(y) = 1 - y$, and $v_2(y) = (3/2)(1 - y^2)$. So

$$v_1'(y) = -1 \le -3y = v_2'(y) \quad \text{if } 0 \le y \le 1/3,$$

just as needed. Now taking $H_k$ as our induction assumption, we seek to prove $H_{k+1}$.

First, for $y \in [\xi_k, 1/3]$, the second line of (25) gives us $v_k'(\cdot)$. By restricted supermodularity (4), the monotonicity $\xi_k \le \xi_{k+1}$, and the induction assumption $H_k$, we see for $y \in [\xi_{k+1}, 1/3]$ that

$$\begin{aligned}
v_k'(y) &= v_{k-1}(y) - 1 - v_{k-1}(1-y) + y\,v_{k-1}'(y) \\
&\le v_k(y) - 1 - v_k(1-y) + y\,v_k'(y) = v_{k+1}'(y),
\end{aligned}$$

completing the proof of $H_{k+1}$. To complete the proof of (26), one just needs to note that the lower bound $-1 \le v_k'(y)$ now follows from $v_1'(y) = -1$ together with (28) and $H_k$.
To prove (27), we again use induction, but this time the sequence of hypotheses is given by

$$H_k : v_k'(y) \le v_{k-1}'(y) \quad \text{for } y \in [1/2, 1] \text{ and } 2 \le k < \infty.$$

As before, $v_1(y) = 1 - y$ and $v_2(y) = (3/2)(1 - y^2)$, so $v_1'(y) = -1$ and $v_2'(y) = -3y$. For $y \ge 1/2$, we then have

$$v_2'(y) \le -3/2 < -1 = v_1'(y),$$

proving $H_2$. As tradition demands, we again take $H_k$ as our induction assumption, and we seek to prove $H_{k+1}$.

Since $y \in [1/2, 1]$, we have $1 - y \le 1/2 \le y$, so the restricted supermodularity property (4) gives us

$$v_{k-1}(1-y) - v_{k-1}(y) \le v_k(1-y) - v_k(y). \tag{29}$$

Next, recall the identity of the bottom line of (25), but, as you do so, replace $k$ by $k+1$ (this line applies because $y \ge 1/2 > 1/3 \ge \xi_{k+1}$). We can then directly apply (29) and $H_k$ to get

$$\begin{aligned}
v_{k+1}'(y) &= v_k(y) - 1 - v_k(1-y) + y\,v_k'(y) \\
&\le v_{k-1}(y) - 1 - v_{k-1}(1-y) + y\,v_{k-1}'(y) = v_k'(y).
\end{aligned}$$

This inequality completes the proof of $H_{k+1}$ and confirms the first inequality of (27). For the second inequality of (27), $v_k'(y) \le -1$ on $[1/2, 1]$, we just need to note that it follows from the fact $v_1'(y) = -1$ and the validity of $H_k$ for all $k \ge 2$. □
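The inequalities (26) and (27) can be checked directly for $k = 1, 2$ using the explicit derivatives $v_1'(y) = -1$, $v_2'(y) = -3y$, and $v_3'(y)$, which follows by differentiating the closed form of $v_3$ in Section 4 (equivalently, from (25)). The Python sketch below does this on a fine grid; it is a sanity check of the first two cases, not a substitute for the induction.

```python
import math


def dv1(y):
    return -1.0


def dv2(y):
    return -3.0 * y


def dv3(y):
    # derivative of the closed form of v_3; equals g_3(y) v_2'(y) for y <= 1/6
    if y <= 1.0 / 6.0:
        return -3.0 * y * (1.0 - math.sqrt(2.0 / 3.0 + y * y))
    return 0.5 - 3.0 * y - 3.0 * y * y


derivs = [dv1, dv2, dv3]
grid = [i / 3000 for i in range(3001)]
for k in (1, 2):                      # derivs[k - 1] is v_k', derivs[k] is v_{k+1}'
    lo, hi = derivs[k - 1], derivs[k]
    for y in grid:
        if y <= 1.0 / 3.0:            # (26)
            assert -1.0 <= lo(y) <= hi(y) <= 0.0
        if y >= 0.5:                  # (27)
            assert hi(y) <= lo(y) <= -1.0
print("(26) and (27) hold on the grid for k = 1, 2")
```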

The smoothness of the value functions converts easily into a very useful Lipschitz 
equi-continuity property of the threshold functions. 

Lemma 8 (Lipschitz Equi-Continuity of Threshold Functions). For all $k \ge 1$ we have

$$|g_k(y) - g_k(z)| \le |y - z| \quad \text{for all } y, z \in [0, 1]. \tag{30}$$

Proof. We first consider $y \in [0, \xi_k]$. In this case, the identity (24) holds, and by its differentiation we obtain

$$g_k'(y) = -\frac{|v_{k-1}'(y)|}{|v_{k-1}'(1 - g_k(y))|} \le 0 \quad \text{for all } y \in [0, \xi_k]. \tag{31}$$

Moreover, since $y \in [0, \xi_k]$ we know that $y \le 1/3$, so by (8) we have $g_k(y) \le 1/3$; hence $1 - g_k(y) \ge 2/3$, and by (27) we obtain $1 \le |v_{k-1}'(1 - g_k(y))|$. Consequently, (31) gives us

$$|g_k'(y)| \le |v_{k-1}'(y)| \quad \text{for all } y \in [0, \xi_k], \tag{32}$$

and (26) implies $|v_{k-1}'(y)| \le 1$. Thus, at last, we have the uniform bound

$$|g_k'(y)| \le 1 \quad \text{for all } y \in [0, \xi_k], \tag{33}$$

which confirms the inequality (30) for $y, z \in [0, \xi_k]$. Also, for $y, z \in [\xi_k, 1]$ we have $g_k(y) = y$ and $g_k(z) = z$, so (30) trivially holds. If we choose $y < \xi_k < z$, the triangle inequality gives us

$$|g_k(y) - g_k(z)| \le |g_k(y) - g_k(\xi_k)| + |g_k(\xi_k) - g_k(z)| \le |y - z|,$$

confirming that (30) holds in general. □
6. The Optimal Policy at Infinity

The minimal fixed points $\xi_k$, $k = 1, 2, \ldots$, are non-decreasing and bounded by $1/3$, so they have a limit

$$\lim_{k \to \infty} \xi_k = \xi \le 1/3. \tag{34}$$

The threshold values $g_k(y)$, $k = 1, 2, \ldots$, are also non-decreasing and bounded, so they have a pointwise limit $g_\infty(y)$. The next proposition characterizes $g_\infty$ and gives a crucial bound on the uniform rate of convergence to $g_\infty$.

Proposition 9 (Characterization of Limiting Threshold). For the limit threshold $g_\infty$ we have the formula

$$g_\infty(y) = \max\{\xi, y\} \quad \text{for all } y \in [0, 1].$$

Moreover, we have an exact measure of the uniform rate of convergence

$$\max_{0 \le y \le 1} |g_k(y) - g_\infty(y)| = \xi - \xi_k \quad \text{for all } k \ge 1. \tag{35}$$

Proof. We first fix $m$ and $y \in [0, \xi_m]$. We then recall that $y \le \xi_m \le 1/3$ implies, by (8), that $g_j(y) \le 1/3$ for all $j \ge 1$. Now, given $k > m$, we can repeatedly apply the top line of (25), which is legitimate since $y \le \xi_m \le \xi_j$ for all $j \ge m$, to obtain

$$|v_{k-1}'(y)| = |v_{m-1}'(y)| \prod_{j=m}^{k-1} g_j(y) \le 3^{m-k}\,|v_{m-1}'(y)| \quad \text{for } y \in [0, \xi_m], \tag{36}$$

and by (26) we have $|v_{m-1}'(y)| \le 1$ for all $y \in [0, 1/3]$, so (32) gives us more simply

$$\max_{0 \le y \le \xi_m} |g_k'(y)| \le 3^{m-k} \quad \text{for all } k > m. \tag{37}$$

Now, for any $y, z$ in $[0, \xi_m]$ we have $|g_k(y) - g_k(z)| \le 3^{m-k}|y - z|$, so, letting $k \to \infty$, we obtain that $g_\infty$ is constant on $[0, \xi_m]$ for each $m \ge 1$. Since $\xi_m \uparrow \xi$, there is a constant $c$ such that $g_\infty(y) = c$ for all $y \in [0, \xi)$.

As Figure 1 suggests, $c = \xi$, and this is easy to confirm. Again we fix $m$, take $k > m$, and note that by the triangle inequality and the Lipschitz bound (30) on $g_k$ we have

$$\begin{aligned}
|g_\infty(\xi_m) - \xi_k| &\le |g_\infty(\xi_m) - g_k(\xi_m)| + |g_k(\xi_m) - g_k(\xi_k)| \\
&\le |g_\infty(\xi_m) - g_k(\xi_m)| + |\xi_m - \xi_k|.
\end{aligned}$$

When $k \to \infty$, $g_k(\xi_m)$ converges to $g_\infty(\xi_m)$ and $\xi_k$ to $\xi$, so we have

$$|g_\infty(\xi_m) - \xi| \le |\xi_m - \xi|.$$

Since $g_\infty(\xi_m) = c$ does not depend on $m$ and since $|\xi_m - \xi| \to 0$ as $m \to \infty$, we see that $g_\infty(\xi_m) = \xi$ for all $m \ge 1$, and consequently $g_\infty(y) = \xi$ for all $y \in [0, \xi]$. Finally, for all $m \ge 1$ we also have $g_m(y) = y$ for each $y \in [\xi, 1]$, so the proof of the formula for $g_\infty$ is complete.

To prove (35), we first note

$$g_\infty(y) - g_k(y) = \begin{cases} \xi - g_k(y) & y \in [0, \xi_k], \\ \xi - y & y \in [\xi_k, \xi], \\ 0 & y \in [\xi, 1]. \end{cases}$$

By (31), $g_k(y)$ is non-increasing on $[0, \xi_k]$, so the gap $g_\infty(y) - g_k(y)$ is maximized when $y = \xi_k$. This gap decreases linearly over the interval $[\xi_k, \xi]$ and equals $0$ at $\xi$; consequently the maximal gap is exactly equal to $\xi - \xi_k$. □

7. The Central Limit Theorem for $A_n^o(\pi_\infty)$ Is Easy



We now recall that $\xi$ denotes the limit (34) of the minimal fixed points, and we define a selection policy $\pi_\infty$ for all $X_1, X_2, \ldots$ by taking the (time-independent) threshold function to be

$$g_\infty(y) = \max\{\xi, y\} = \xi \vee y.$$

If $A_n^o(\pi_\infty)$ counts the number of selections made by the policy $\pi_\infty$ up to and including time $n$, then we have the explicit formula

$$A_n^o(\pi_\infty) = \sum_{i=1}^{n} \mathbb{1}\big(X_i \ge \xi \vee Y_{i-1}'\big), \tag{38}$$

where one sets $Y_0' = 0$, and one defines $Y_i'$ for $i \ge 1$ recursively by

$$Y_i' = \begin{cases} Y_{i-1}' & \text{if } X_i < \xi \vee Y_{i-1}', \\ 1 - X_i & \text{if } X_i \ge \xi \vee Y_{i-1}'. \end{cases} \tag{39}$$

Given the facts that have been accumulated, it turns out to be a reasonably easy task to prove a central limit theorem for $A_n^o(\pi_\infty)$. One just needs to make the right connection to the known central limit theorems for Markov additive processes.

To make this connection explicit, we first consider the bivariate random sequence $\{Z_i = (X_i, Y_{i-1}') : i = 1, 2, 3, \ldots\}$ and note that it may be viewed as a two-dimensional Markov chain on the state space $S = [0,1] \times [0, 1 - \xi]$. Specifically, for any pair $(x, y) \in S$ and any Borel set $C \subseteq S$, we have the point-to-set transition kernel

$$K((x,y), C) = \int_0^1 \big[\mathbb{1}\{(u, 1-x) \in C\}\,\mathbb{1}(x \ge \xi \vee y) + \mathbb{1}\{(u, y) \in C\}\,\mathbb{1}(x < \xi \vee y)\big]\,du.$$

Given this explicit formula, it is straightforward (but admittedly a little tedious) to check that a stationary probability measure for the kernel $K$ is given by the uniform distribution $\gamma$ on $S = [0,1] \times [0, 1 - \xi]$. We will confirm shortly that $\gamma$ is also the unique stationary distribution.
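The stationarity of $\gamma$ can be sanity-checked by simulation. The Python sketch below runs recursion (39) for a fixed $\xi$ (the value 0.29 is an illustrative choice in $[0, 1/3]$, not a computed value of the limit (34)) and compares the long-run behavior of $Y_i'$ with the uniform law on $[0, 1 - \xi]$. It also compares the empirical selection rate with $\gamma\{(x, y) : x \ge \xi \vee y\} = 1 - (\xi^2 + (1-\xi)^2)/(2(1-\xi))$, which follows by integrating $1 - \max\{\xi, y\}$ against the uniform density of $y$ on $[0, 1 - \xi]$. Run length, burn-in, and seed are arbitrary.

```python
import random


def simulate_pi_infinity(n, xi, seed=0):
    """Run recursion (39) for n steps from Y'_0 = 0.

    Returns the number of selections, i.e. A_n^o(pi_infinity) as in (38),
    and the list of states Y'_1, ..., Y'_n.
    """
    rng = random.Random(seed)
    y, selections, states = 0.0, 0, []
    for _ in range(n):
        x = rng.random()
        if x >= max(xi, y):          # select X_i; the state becomes 1 - X_i
            selections += 1
            y = 1.0 - x
        states.append(y)             # otherwise the state is unchanged
    return selections, states


xi = 0.29                            # illustrative value in [0, 1/3]
n = 200_000
count, states = simulate_pi_infinity(n, xi)
tail = states[1000:]                 # discard a short burn-in
mean_y = sum(tail) / len(tail)
below_mid = sum(s <= (1 - xi) / 2 for s in tail) / len(tail)
rate = count / n
predicted = 1 - (xi**2 + (1 - xi)**2) / (2 * (1 - xi))
print(mean_y, (1 - xi) / 2)          # uniform on [0, 1 - xi] has mean (1 - xi)/2
print(below_mid)                     # should be close to 1/2
print(rate, predicted)               # empirical versus stationary selection rate
```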

To more deeply understand the chain $Z_i$, $i = 1, 2, \ldots$, we now consider the double chain $(Z_i, \hat{Z}_i)$, $i = 1, 2, \ldots$, where $Z_1 = (x, y)$ is an arbitrary point of $S$ and $\hat{Z}_1$ has the uniform distribution on $S$. For $i = 1, 2, \ldots$, the chains $\{Z_i = (X_i, Y_{i-1}')\}$ and $\{\hat{Z}_i = (X_i, \hat{Y}_{i-1}')\}$ share the same independent uniform sequence $X_i$, $i = 1, 2, \ldots$, as their first coordinate, while their second coordinates $Y_{i-1}'$ and $\hat{Y}_{i-1}'$ are both determined by the recursion (39). Typically these coordinates differ because of their differing initial values, but we will check that they do not differ for long.

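To make this precise, we set $\nu = \min\{i \ge 1 : X_i > 1 - \xi\}$, and we show that $\nu$ is a coupling time for $(Z_i, \hat{Z}_i)$ in the sense that

$$Z_i = \hat{Z}_i \quad \text{for all } i > \nu.$$

Since $Y_i'$ and $\hat{Y}_i'$ both satisfy the recursion (39) and start in $[0, 1 - \xi]$, we have

$$Y_i' \le 1 - \xi \quad \text{and} \quad \hat{Y}_i' \le 1 - \xi \quad \text{for all } i = 0, 1, 2, \ldots,$$

so by the definition of $\nu$ (and since $\xi \le 1/3 < 1 - \xi$) we must have

$$\max\{\xi \vee Y_{\nu-1}',\, \xi \vee \hat{Y}_{\nu-1}'\} < X_\nu.$$

The recursion (39) then gives us

$$Y_\nu' = \hat{Y}_\nu' = 1 - X_\nu \quad \text{and} \quad Z_{\nu+1} = \hat{Z}_{\nu+1}.$$

By the construction of the double process $(Z_i, \hat{Z}_i)$, if one has $Z_i(\omega) = \hat{Z}_i(\omega)$ for some $i = i(\omega)$, then $Z_j(\omega) = \hat{Z}_j(\omega)$ for all $j \ge i(\omega)$, so $\nu$ is indeed a coupling time for $(Z_i, \hat{Z}_i)$.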
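The coupling can be watched in action. In this Python sketch (an illustration with the assumed value $\xi = 0.29$; the helper `coupled_paths` is hypothetical), two copies of recursion (39), started from the extreme states $0$ and $1 - \xi$, are driven by the same uniforms, and their states coincide from time $\nu$ on. A quick Monte Carlo estimate of $P(\nu > i)$ is then compared with $(1 - \xi)^i$, the quantity that controls the coupling inequality.

```python
import random


def coupled_paths(y_a, y_b, xi, n, seed=1):
    """Drive two copies of recursion (39) with the same uniforms X_1, ..., X_n.

    Returns nu = min{i : X_i > 1 - xi} (None if it does not occur by time n)
    and the two state paths, where path[i - 1] holds the state after step i.
    """
    rng = random.Random(seed)
    nu, path_a, path_b = None, [], []
    for i in range(1, n + 1):
        x = rng.random()
        if nu is None and x > 1.0 - xi:
            nu = i
        y_a = 1.0 - x if x >= max(xi, y_a) else y_a
        y_b = 1.0 - x if x >= max(xi, y_b) else y_b
        path_a.append(y_a)
        path_b.append(y_b)
    return nu, path_a, path_b


xi = 0.29                                   # illustrative value in [0, 1/3]
nu, a, b = coupled_paths(0.0, 1.0 - xi, xi, n=50)
print(nu, all(p == q for p, q in zip(a[nu - 1:], b[nu - 1:])))

# Monte Carlo estimate of P(nu > steps) versus (1 - xi)^steps
rng = random.Random(2)
trials, steps = 100_000, 3
late = sum(all(rng.random() <= 1.0 - xi for _ in range(steps)) for _ in range(trials))
print(late / trials, (1.0 - xi) ** steps)
```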

The coupling inequality (see, e.g., Lindvall, 2002, p. 12) then tells us that for all Borel sets $C \subseteq S$ we have

$$|K^i((x, y), C) - \gamma(C)| \le P(\nu > i) = (1 - \xi)^i, \tag{40}$$

where $\gamma$ is the uniform stationary distribution on $S$. The bound (40) has several useful implications. First, it implies that $\gamma$ is the unique stationary distribution for the chain with kernel $K$. It also implies (see, e.g., Meyn and Tweedie, 2009, Theorem 16.0.1) that the chain $\{Z_i : i = 1, 2, \ldots\}$ is uniformly ergodic; more specifically, it is a $\phi$-mixing chain with

$$\phi(i) \le 2\rho^i \quad \text{and} \quad \rho = 1 - \xi.$$



If we set $z = (x, y)$ and $f(z) = \mathbb{1}(x \ge \xi \vee y)$, then the representation (38) can also be written in terms of the chain $\{Z_i : i = 1, 2, \ldots\}$ as

$$A_n^o(\pi_\infty) = \sum_{i=1}^{n} f(Z_i),$$

and this makes it explicit that $A_n^o(\pi_\infty)$ is a Markov additive process. Our coupling and the uniform ergodicity of $\{Z_i : i = 1, 2, \ldots\}$ imply (see, e.g., Meyn and Tweedie, 2009, Theorem 17.5.3 and Lemma 17.5.1) that there is a constant $\sigma^2 \ge 0$ such that

$$\lim_{n \to \infty} n^{-1} \operatorname{Var}\big(A_n^o(\pi_\infty)\big) = \lim_{n \to \infty} n^{-1} \operatorname{Var}_\gamma\big(A_n^o(\pi_\infty)\big) = \sigma^2, \tag{41}$$

where the first variance refers to the chain started at $Z_1 = (X_1, 0)$ and the second variance refers to the chain started at $Z_1$ with the stationary distribution $\gamma$ (i.e. the uniform distribution on $S$). The general theory also provides the series representation for the limit (41):



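$$\sigma^2 = \operatorname{Var}_\gamma\big[\mathbb{1}(X_1 \ge \xi \vee Y_0')\big] + 2 \sum_{i=2}^{\infty} \operatorname{Cov}_\gamma\big[\mathbb{1}(X_1 \ge \xi \vee Y_0'),\ \mathbb{1}(X_i \ge \xi \vee Y_{i-1}')\big], \tag{42}$$

where $\gamma$ again refers to the situation in which the chain starts with $Z_1$ having the stationary distribution.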
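The limit (41) suggests a simple, if crude, way to approximate $\sigma^2$ and $\mu$ numerically: simulate many independent copies of $A_n^o(\pi_\infty)$ from $Z_1 = (X_1, 0)$ and divide the sample variance by $n$. This replication estimator is a swapped-in numerical device, not the series (42) itself; the sketch assumes $\xi = 1 - \sqrt{2}/2$, and the sample sizes and function names are arbitrary, hypothetical choices.

```python
import math
import random
import statistics

XI = 1.0 - math.sqrt(2.0) / 2.0  # assumed value of xi


def count_selections(n, rng):
    """One copy of A_n^o(pi_infty) = sum_{i=1}^n f(Z_i), started from Y_0 = 0."""
    y, count = 0.0, 0
    for _ in range(n):
        x = rng.random()
        if x >= max(XI, y):      # f(Z_i) = 1(X_i >= xi v Y_{i-1})
            count += 1
            y = 1.0 - x
    return count


def estimate_mu_sigma2(n, reps, seed=0):
    """Replication estimates of mu ~ E[A_n]/n and sigma^2 ~ Var(A_n)/n."""
    rng = random.Random(seed)
    totals = [count_selections(n, rng) for _ in range(reps)]
    return statistics.fmean(totals) / n, statistics.variance(totals) / n


mu_hat, sigma2_hat = estimate_mu_sigma2(n=1000, reps=1000)
print(mu_hat, 2.0 - math.sqrt(2.0))
print(sigma2_hat)
```

A strictly positive estimate of $\sigma^2$ is consistent with, but of course does not prove, the lower bound of Lemma 11.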

The general representations (41) and (42) give us the existence of $\sigma^2$, but they do not automatically entail $\sigma^2 > 0$, so to prove a central limit theorem for $A_n^o(\pi_\infty)$ with the classical normalization, one must independently establish that $\sigma^2 > 0$. To show this, we first need an elementary lemma that provides a variance analog to the information processing inequality for entropy.

Lemma 10 (Information Processing Lemma). If a random variable $X$ has values in $\{1, 2, \ldots\}$ and $P(X = 1) = p$, then $p(1 - p) \le \operatorname{Var}(X)$.

Proof. Define a function $f$ on the natural numbers $\mathbb{N}$ by setting $f(1) = 0$ and $f(k) = 1$ for $k > 1$. We then have $|f(x) - f(y)| \le |x - y|$ for all $x, y \in \mathbb{N}$. If we take $Y$ to be an independent copy of $X$, then we have

$$2p(1 - p) = E[(f(X) - f(Y))^2] \le E[(X - Y)^2] = 2 \operatorname{Var}(X).$$

□
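Lemma 10 is easy to test on small examples with exact arithmetic. The helper below (a hypothetical name, `check_lemma10`) takes a finite probability mass function on $\{1, 2, \ldots\}$ and returns the pair $(p(1-p), \operatorname{Var}(X))$. The first example, a fair choice between 1 and 2, shows that the inequality can hold with equality.

```python
from fractions import Fraction


def check_lemma10(pmf):
    """pmf maps values in {1, 2, ...} to exact probabilities summing to 1.

    Returns (p(1 - p), Var(X)) with p = P(X = 1)."""
    assert all(k >= 1 for k in pmf) and sum(pmf.values()) == 1
    mean = sum(k * q for k, q in pmf.items())
    var = sum((k - mean) ** 2 * q for k, q in pmf.items())
    p = pmf.get(1, Fraction(0))
    return p * (1 - p), var


examples = [
    {1: Fraction(1, 2), 2: Fraction(1, 2)},                     # equality case
    {1: Fraction(1, 3), 2: Fraction(1, 3), 5: Fraction(1, 3)},
    {1: Fraction(9, 10), 100: Fraction(1, 10)},
]
for pmf in examples:
    lower, var = check_lemma10(pmf)
    print(lower, var, lower <= var)
```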

Now we can address the main lemma of this section. 

Lemma 11. The variance of $A_n^o(\pi_\infty)$ satisfies the asymptotic lower bound

$$\operatorname{Var}\big(A_n^o(\pi_\infty)\big) = \Omega(n), \quad \text{as } n \to \infty.$$

Proof. We first set $\nu_0 = 0$ and then we define the stopping times

$$\nu_t = \inf\{i > \nu_{t-1} : X_i > 1 - \xi\}, \quad t = 1, 2, \ldots.$$

We also set $T = \inf\{t : \nu_t > n\}$, and note that $T$ is a stopping time with respect to the increasing sequence of $\sigma$-fields

$$\mathcal{G}_t = \sigma\{\nu_1, \nu_2, \ldots, \nu_t\} \quad \text{for all } t \ge 1.$$

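Next, we set

$$U_t = \sum_{i=\nu_{t-1}+1}^{\nu_t} \mathbb{1}(X_i \ge \xi \vee Y_{i-1}) \quad \text{for } 1 \le t \le T, \tag{43}$$

and set

$$V = \sum_{i=n+1}^{\nu_T} \mathbb{1}(X_i \ge \xi \vee Y_{i-1}),$$

so we have the representation

$$A_n^o(\pi_\infty) = A_{\nu_T}^o(\pi_\infty) - V = \sum_{t=1}^{T} U_t - V. \tag{44}$$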

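Here, the random variables $U_t$, $t = 1, 2, \ldots$, are independent and identically distributed, since each block starts from a state with $\xi \vee Y = \xi$ (we have $Y_0 = 0$ and $Y_{\nu_{t-1}} = 1 - X_{\nu_{t-1}} < \xi$). We also have $V \le \nu_T - n$ and $\nu_T = \inf\{i > n : X_i > 1 - \xi\}$, so the variance of $V$ is bounded by a constant that depends only on $\xi$. The existence of the limit (41) and the Cauchy-Schwarz inequality then give us

$$\operatorname{Var}\big(A_n^o(\pi_\infty)\big) = \operatorname{Var}\big(A_{\nu_T}^o(\pi_\infty)\big) + O(\sqrt{n}) \quad \text{as } n \to \infty, \tag{45}$$

so to prove the lemma it suffices to show $\operatorname{Var}(A_{\nu_T}^o(\pi_\infty)) = \Omega(n)$.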
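The block decomposition (44) can be checked along simulated paths. The sketch records the renewal times $\nu_t$, the block sums $U_t$, and the overshoot $V$, and it lets one confirm that every block contains at least one selection (the value $X_{\nu_t} > 1 - \xi$ is always selected), so that $U_t \in \{1, 2, \ldots\}$ as Lemma 10 requires. As before, $\xi = 1 - \sqrt{2}/2$ is assumed and `blocks_check` is a hypothetical helper.

```python
import math
import random

XI = 1.0 - math.sqrt(2.0) / 2.0  # assumed value of xi


def blocks_check(n, seed):
    """Simulate pi_infty until nu_T, the first renewal after n.

    Returns (A_n, [U_1, ..., U_T], V, [nu_1, ..., nu_T]) for one sample path.
    """
    rng = random.Random(seed)
    y = 0.0
    selected = []       # selected[i - 1] = 1(X_i >= xi v Y_{i-1})
    renewals = []       # renewal times nu_t, i.e. indices with X_i > 1 - xi
    i = 0
    while True:
        i += 1
        x = rng.random()
        is_selected = x >= max(XI, y)
        selected.append(int(is_selected))
        if is_selected:
            y = 1.0 - x
        if x > 1.0 - XI:
            renewals.append(i)
            if i > n:   # this renewal is nu_T
                break
    a_n = sum(selected[:n])
    blocks, prev = [], 0
    for nu_t in renewals:                      # U_t counts selections in (nu_{t-1}, nu_t]
        blocks.append(sum(selected[prev:nu_t]))
        prev = nu_t
    v = sum(selected[n:renewals[-1]])          # V counts selections in (n, nu_T]
    return a_n, blocks, v, renewals


a_n, blocks, v, renewals = blocks_check(500, seed=1)
print(a_n, sum(blocks) - v, len(blocks))     # A_n, sum of U_t minus V, and T
```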
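By the definition of $\nu_t$ and $U_t$, $t = 1, 2, \ldots$, we have

$$A_{\nu_T}^o(\pi_\infty) = \sum_{t=1}^{T} U_t,$$

so, by the conditional variance formula, the independence of the $U_t$'s, and the fact that $T$ is $\mathcal{G}_T$-measurable, we have the bound

$$\operatorname{Var}\Big(\sum_{t=1}^{T} U_t\Big) \ge E\Big[\operatorname{Var}\Big(\sum_{t=1}^{T} U_t \,\Big|\, \mathcal{G}_T\Big)\Big] = E\Big[\sum_{t=1}^{T} \operatorname{Var}(U_t \mid \mathcal{G}_t)\Big]. \tag{46}$$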

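We now note from the definition (43) that $U_t$ takes values in $\{1, 2, \ldots, \nu_t - \nu_{t-1}\}$. Thus, if $p$ is the probability that no $X_i$ is selected for $i \in \{\nu_{t-1} + 1, \ldots, \nu_t - 1\}$, then setting $\alpha = \xi/(1 - \xi)$ we have

$$p = P(U_t = 1 \mid \mathcal{G}_t) = P(X_i < \xi \text{ for all } \nu_{t-1} + 1 \le i \le \nu_t - 1 \mid \mathcal{G}_t) = \alpha^{\nu_t - \nu_{t-1} - 1},$$

since each block starts from a state with $\xi \vee Y = \xi$ and, conditionally on $\mathcal{G}_t$, the values $X_i$ with $\nu_{t-1} < i < \nu_t$ are independent and uniform on $[0, 1 - \xi]$. Now, by applying Lemma 10 to the conditional distribution, we have

$$\operatorname{Var}(U_t \mid \mathcal{G}_t) \ge \alpha^{\nu_t - \nu_{t-1} - 1}\big(1 - \alpha^{\nu_t - \nu_{t-1} - 1}\big),$$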



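so from (46) we have

$$\operatorname{Var}\Big(\sum_{t=1}^{T} U_t\Big) \ge E\Big[\sum_{t=1}^{T} \alpha^{\nu_t - \nu_{t-1} - 1}\big(1 - \alpha^{\nu_t - \nu_{t-1} - 1}\big)\Big].$$

The summands are independent and identically distributed and $T$ is a stopping time with respect to the increasing sequence of $\sigma$-fields $\mathcal{G}_t = \sigma\{\nu_1, \nu_2, \ldots, \nu_t\}$, $t \ge 1$, so by Wald's identity we have

$$\operatorname{Var}\Big(\sum_{t=1}^{T} U_t\Big) \ge E[T]\, E\big[\alpha^{\nu_1 - 1}\big(1 - \alpha^{\nu_1 - 1}\big)\big]. \tag{47}$$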

For the stopping time $T$, we have the alternative representation

$$T = 1 + \sum_{i=1}^{n} \mathbb{1}(X_i > 1 - \xi),$$

so we have $E[T] = \xi n + O(1)$. Since $\nu_1$ has the geometric distribution with success probability $\xi$, and since $0 < \alpha < 1$ and $P(\nu_1 \ge 2) > 0$, we also have $E[\alpha^{\nu_1 - 1}(1 - \alpha^{\nu_1 - 1})] > 0$, so by (45) and (47) the proof of the lemma is complete. □
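The proof yields an explicit, if conservative, constant. Since $E[T] = \xi n + O(1)$, inequalities (45) and (47) suggest $\liminf_n n^{-1} \operatorname{Var}(A_n^o(\pi_\infty)) \ge \xi\, E[\alpha^{\nu_1 - 1}(1 - \alpha^{\nu_1 - 1})]$. The sketch below evaluates this expectation for $\nu_1$ geometric with success probability $\xi$, both by direct summation and by two geometric series. It assumes $\xi = 1 - \sqrt{2}/2$ and $\alpha = \xi/(1 - \xi)$, the conditional probability that a value below $1 - \xi$ also lies below $\xi$; the function names are hypothetical.

```python
import math

XI = 1.0 - math.sqrt(2.0) / 2.0   # assumed value of xi
ALPHA = XI / (1.0 - XI)           # assumed alpha = P(X < xi | X <= 1 - xi)


def wald_factor_series(terms=200):
    """E[a^(nu-1) (1 - a^(nu-1))] for nu ~ Geometric(xi) on {1, 2, ...}, summed directly."""
    return sum(XI * (1.0 - XI) ** k * (ALPHA ** k - ALPHA ** (2 * k)) for k in range(terms))


def wald_factor_closed():
    """Same expectation: sum_k xi (1-xi)^k a^k minus the same series with a^2."""
    q = 1.0 - XI
    return XI / (1.0 - q * ALPHA) - XI / (1.0 - q * ALPHA ** 2)


c = wald_factor_closed()
print(c, wald_factor_series())
print(XI * c)   # implied lower bound on sigma^2 under the stated assumptions
```

With these assumed values the closed form simplifies to $\sqrt{2} - 1 - 1/3 \approx 0.081$, so the implied lower bound on $\sigma^2$ is roughly $0.024$. It is far from sharp, but positivity is all that Lemma 11 needs.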

All of the pieces are now in place. Lemma 11 and the limit (41) give $\sigma^2 > 0$, so by the central limit theorem for functions of uniformly ergodic Markov chains (Meyn and Tweedie, 2009, Theorem 17.5.3; or Jones, 2004, Corollary 5) we get our central limit theorem for $A_n^o(\pi_\infty)$.

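Proposition 12 (Central Limit Theorem for $A_n^o(\pi_\infty)$). As $n \to \infty$, we have the limit

$$\frac{A_n^o(\pi_\infty) - \mu n}{\sqrt{n}} \Longrightarrow N(0, \sigma^2),$$

where $\mu = E_\gamma[\mathbb{1}(X_1 \ge \xi \vee Y_0')]$, $\gamma$ is the stationary distribution for the Markov chain $\{Z_i : i = 1, 2, \ldots\}$, and $\sigma^2$ is the constant defined by either the limits (41) or the sum (42).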



By appealing to the known relation $E[A_n^o(\pi_n^*)] = (2 - \sqrt{2})n + O(1)$, one can show with a bit of calculation that here we have $\mu = 2 - \sqrt{2}$. Since this identification is implicit in the calculations of the next section, there is no reason to belabor it here.
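As a quick consistency check (a sketch, not the route through $E[A_n^o(\pi_n^*)]$ described above), $\mu$ can also be computed directly from its definition in Proposition 12. Under $\gamma$, $X_1$ is uniform on $[0, 1]$ and independent of $Y_0'$, which is uniform on $[0, 1 - \xi]$. The calculation uses only $\xi \le 1/2$ until the last line, where we assume $\xi = 1 - \sqrt{2}/2$, a value fixed earlier in the paper and not restated in this section.

$$\begin{aligned}
\mu &= P(X_1 \ge \xi \vee Y_0') = 1 - E[\xi \vee Y_0'] \\
&= 1 - \frac{1}{1 - \xi}\Big(\int_0^{\xi} \xi \, dy + \int_{\xi}^{1-\xi} y \, dy\Big) = 1 - \frac{\xi^2 + (1 - \xi)^2}{2(1 - \xi)}, \\
&\text{and with } 1 - \xi = \tfrac{\sqrt{2}}{2},\ (1-\xi)^2 = \tfrac12,\ \xi^2 = \tfrac32 - \sqrt{2}: \quad
\mu = 1 - \frac{2 - \sqrt{2}}{\sqrt{2}} = 1 - (\sqrt{2} - 1) = 2 - \sqrt{2}.
\end{aligned}$$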

8. $A_n^o(\pi_n^*)$ and $A_n^o(\pi_\infty)$ Are Close in $L^2$



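Proposition 12 tells us that the easy sum $A_n^o(\pi_\infty)$ obeys a central limit theorem, and now the task is to show that the harder sum $A_n^o(\pi_n^*)$ follows the same law. The essence is to show that, after centering, the random variables $A_n^o(\pi_n^*)$ and $A_n^o(\pi_\infty)$ are close in $L^2$ in the sense that

$$\big\| A_n^o(\pi_n^*) - A_n^o(\pi_\infty) - E[A_n^o(\pi_n^*) - A_n^o(\pi_\infty)] \big\|_2 = o(\sqrt{n}) \quad \text{as } n \to \infty.$$

For technical convenience, we work with the random variable

$$\Delta_n = A_{n-2}^o(\pi_n^*) - A_{n-2}^o(\pi_\infty) - E[A_{n-2}^o(\pi_n^*) - A_{n-2}^o(\pi_\infty)].$$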

The essential estimate of our development is given by the next lemma. In one way or another, the proof of the lemma calls on all of the machinery that has been developed.

Lemma 13 ($L^2$-Estimate). There is a constant $C$ such that for all $n \ge 3$ we have

$$\| \Delta_n \|_2^2 \le C \sum_{k=3}^{n} (\xi - \xi_k),$$

so, in particular, since $\xi_k \to \xi$ as $k \to \infty$, we have the asymptotic estimate

$$\| \Delta_n \|_2 = o(\sqrt{n}) \quad \text{as } n \to \infty.$$



Proof. We first note that the threshold function lower bound (23) implies that, for all $1 \le i \le n - 2$, the thresholds that govern selection stay below $5/6$. Consequently, if $X_i > 5/6$, then $X_i$ is selected by both of the policies $\pi_n^*$ and $\pi_\infty$. At such a time $i$, we have a kind of "renewal event," though we still have to be attentive to the non-homogeneity of the selection process driven by $\pi_n^*$.

To formalize this notion, we set $\tau_0 = 0$ and for $m \ge 1$ we define stopping times

$$\tau_m = \inf\{i > \tau_{m-1} : X_i > 5/6\} \quad \text{and} \quad \tau_m' = \min\{\tau_m, n - 2\};$$

so $\tau_m$ is the time at which the $m$th "renewal" is observed. For each $1 \le j \le n - 2$, we then set

$$N(j) = \sum_{i=1}^{j} \mathbb{1}(X_i > 5/6),$$

so the time $\tau_{N(j)}$ is the time of the last renewal up to or equal to $j$, the time $\tau_{N(j)+1}$ is the time of the first renewal strictly after $j$, and we have the inclusion

$$\tau_{N(j)} \le j < \tau_{N(j)+1}.$$
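For $1 \le j \le n - 2$, we then consider the martingale differences defined by

$$d_j = E[A_{n-2}^o(\pi_n^*) - A_{n-2}^o(\pi_\infty) \mid \mathcal{F}_j] - E[A_{n-2}^o(\pi_n^*) - A_{n-2}^o(\pi_\infty) \mid \mathcal{F}_{j-1}],$$

where $\mathcal{F}_0$ is the trivial $\sigma$-field and $\mathcal{F}_j = \sigma\{X_1, X_2, \ldots, X_j\}$ for $1 \le j \le n$. Using the counting variables

$$\eta_i \equiv \mathbb{1}(X_i \text{ is selected by } \pi_n^*) \quad \text{and} \quad \eta_i' \equiv \mathbb{1}(X_i \text{ is selected by } \pi_\infty),$$

we have $A_{n-2}^o(\pi_n^*) - A_{n-2}^o(\pi_\infty) = \sum_{i=1}^{n-2} (\eta_i - \eta_i')$. The terms with $i < j$ are $\mathcal{F}_{j-1}$-measurable and drop out of $d_j$, so we have the tautology

$$\begin{aligned}
d_j ={}& E\Big[\sum_{i=j}^{\tau_{N(j)+1}'} (\eta_i - \eta_i') \,\Big|\, \mathcal{F}_j\Big] - E\Big[\sum_{i=j}^{\tau_{N(j)+1}'} (\eta_i - \eta_i') \,\Big|\, \mathcal{F}_{j-1}\Big] \\
&+ E\Big[\sum_{i=\tau_{N(j)+1}'+1}^{n-2} (\eta_i - \eta_i') \,\Big|\, \mathcal{F}_j\Big] - E\Big[\sum_{i=\tau_{N(j)+1}'+1}^{n-2} (\eta_i - \eta_i') \,\Big|\, \mathcal{F}_{j-1}\Big],
\end{aligned} \tag{48}$$

and this becomes more interesting after one checks that the last two terms cancel.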

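To confirm the cancelation, we first recall that for $\tau_{N(j)+1} \le n - 2$ the value $X_{\tau_{N(j)+1}} > 5/6$ is selected as a member of the alternating subsequence under both policies $\pi_n^*$ and $\pi_\infty$, so the two policies are in the same state immediately after time $\tau_{N(j)+1}$. Any difference in the selections that are made by the policies $\pi_n^*$ and $\pi_\infty$ after time $\tau_{N(j)+1}$ is therefore measurable with respect to the $\sigma$-field

$$\sigma\{\tau_{N(j)+1},\ X_{\tau_{N(j)+1}+k} : k \ge 1\}.$$

Trivially, we have $j < \tau_{N(j)+1}$, so $\mathcal{F}_j$ is independent of this $\sigma$-field, and the last two addends in (48) do cancel as claimed.

We can therefore write

$$d_j = E\Big[\sum_{i=j}^{\tau_{N(j)+1}'} (\eta_i - \eta_i') \,\Big|\, \mathcal{F}_j\Big] - E\Big[\sum_{i=j}^{\tau_{N(j)+1}'} (\eta_i - \eta_i') \,\Big|\, \mathcal{F}_{j-1}\Big] = W_j - \mathcal{I}_{j-1}(W_j), \tag{49}$$

where $W_j$ denotes the first summand and $\mathcal{I}_{j-1}$ is the projection onto $L^2(\mathcal{F}_{j-1})$. Denoting the identity by $I$, we have that $I - \mathcal{I}_{j-1}$ is an $L^2$ contraction, so, using also the conditional Jensen inequality,

$$E[d_j^2] \le E[W_j^2] \le E\Big[\Big(\sum_{i=j}^{\tau_{N(j)+1}'} (\eta_i - \eta_i')\Big)^2\Big], \tag{50}$$

and the remaining task is to estimate the last right-hand side.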

For $1 \le j \le n - 2$, we let $L(j)$ denote the time from $j$ back to the last renewal at or before $j$; in other words, $L(j)$ is the age at time $j$. Analogously, we let $M(j)$ denote the time from $j$ until the time of the next renewal or until time $n - 2$; so $M(j)$ is the residual life at time $j$ with truncation at time $n - 2$. We then have

$$L(j) = j - \tau_{N(j)} \quad \text{and} \quad M(j) = \tau_{N(j)+1}' - j.$$

Our interarrival times are geometric, so $L(j)$ and $M(j)$ are independent, and for $p = 1/6$ we have

$$P(L(j) = \ell) = \begin{cases} p(1-p)^{\ell} & \text{if } 0 \le \ell < j, \\ (1-p)^{j} & \text{if } \ell = j, \end{cases}$$

and

$$P(M(j) = m) = \begin{cases} p(1-p)^{m-1} & \text{if } 1 \le m < n - 2 - j, \\ (1-p)^{n-3-j} & \text{if } m = n - 2 - j. \end{cases}$$
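The two displayed laws are easy to check against simulation. The sketch generates $X_1, \ldots, X_{n-2}$, reads off the renewal times (values above $5/6$), and tabulates the age $L(j)$ and the truncated residual life $M(j)$. The functions are hypothetical helpers, and the choice $j = 7$, $n = 20$ in the usage is arbitrary.

```python
import random

P = 1.0 / 6.0   # P(X_i > 5/6), the renewal probability


def age_pmf(j):
    """[P(L(j) = 0), ..., P(L(j) = j)] from the displayed formula."""
    return [P * (1 - P) ** l for l in range(j)] + [(1 - P) ** j]


def residual_pmf(j, n):
    """[P(M(j) = 1), ..., P(M(j) = n - 2 - j)], assuming j <= n - 3."""
    k = n - 2 - j
    return [P * (1 - P) ** (m - 1) for m in range(1, k)] + [(1 - P) ** (k - 1)]


def simulate(j, n, reps, seed=0):
    """Empirical laws of the age L(j) and truncated residual life M(j)."""
    rng = random.Random(seed)
    ages = [0] * (j + 1)
    residuals = [0] * (n - 1 - j)   # index m holds the count for M(j) = m
    for _ in range(reps):
        xs = [rng.random() for _ in range(n - 2)]             # X_1, ..., X_{n-2}
        renew = [i + 1 for i, x in enumerate(xs) if x > 5 / 6]
        last = max((t for t in renew if t <= j), default=0)    # tau_{N(j)}, tau_0 = 0
        nxt = min((t for t in renew if t > j), default=n - 2)  # tau'_{N(j)+1}
        ages[j - last] += 1
        residuals[nxt - j] += 1
    return [a / reps for a in ages], [r / reps for r in residuals[1:]]


emp_age, emp_res = simulate(7, 20, reps=20_000)
print([round(a - b, 3) for a, b in zip(emp_age, age_pmf(7))])
print([round(a - b, 3) for a, b in zip(emp_res, residual_pmf(7, 20))])
```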

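We now introduce the disagreement set

$$D_j[\ell, m] = \{\omega : \exists\, i \in \{j - \ell + 1, \ldots, j, \ldots, j + m\} : X_i(\omega) \in [\xi_{n-i+1}, \xi]\};$$

this is precisely the set of $\omega$ for which, if the two policies are in the same state at time $j - \ell$, then the policies $\pi_\infty$ and $\pi_n^*$ differ in at least one selection during the time interval $\{j - \ell + 1, \ldots, j + m\}$, while on the complementary set $D_j[\ell, m]^c$ the selections all agree. Thus, by the crudest possible bound, we have

$$\Big| \sum_{i=j}^{\tau_{N(j)+1}'} (\eta_i - \eta_i') \Big| \le (L(j) + M(j))\, \mathbb{1}(D_j[L(j), M(j)]),$$

and when we square both sides and rearrange, we obtain

$$\begin{aligned}
E\Big[\Big(\sum_{i=j}^{\tau_{N(j)+1}'} (\eta_i - \eta_i')\Big)^2\Big] &\le E\big[(L(j) + M(j))^2\, \mathbb{1}(D_j[L(j), M(j)])\big] \\
&= \sum_{m=1}^{n-2-j} \sum_{\ell=0}^{j} (\ell + m)^2\, E\big[\mathbb{1}(D_j[\ell, m])\, \mathbb{1}(L(j) = \ell)\, \mathbb{1}(M(j) = m)\big].
\end{aligned} \tag{51}$$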

For each $1 \le j \le n - 2$ we now set

$$R_j[\ell, m] = \{\omega : X_i(\omega) \le 5/6 \text{ for all } i \in \{j - \ell + 1, \ldots, j + m\}\},$$

so $R_j[\ell, m]$ is the event that no renewal takes place in $[j - \ell + 1, j]$ or in $[j + 1, j + m]$.
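By the definition of $L(j)$ and $M(j)$ we then have

$$\mathbb{1}(L(j) = \ell) = \mathbb{1}(R_j[\ell, 0])\, \mathbb{1}(X_{j-\ell} > 5/6 \text{ or } \ell = j), \quad \text{for } 0 \le \ell \le j,$$

and

$$\mathbb{1}(M(j) = m) \le \mathbb{1}(R_j[0, m - 1]), \quad \text{for } 1 \le m \le n - 2 - j.$$

Thus, if we define $\mathbb{1}(R_j[0, 0]) = 1$, then we have the composite bound

$$\mathbb{1}(L(j) = \ell)\, \mathbb{1}(M(j) = m) \le \mathbb{1}(R_j[\ell, m - 1])\, \mathbb{1}(X_{j-\ell} > 5/6 \text{ or } \ell = j), \tag{52}$$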



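so by inserting (52) in (51) and recalling (50), we find

$$E[d_j^2] \le \sum_{m=1}^{n-2-j} \sum_{\ell=0}^{j} (\ell + m)^2\, E\big[\mathbb{1}(D_j[\ell, m])\, \mathbb{1}(R_j[\ell, m - 1])\, \mathbb{1}(X_{j-\ell} > 5/6 \text{ or } \ell = j)\big]. \tag{53}$$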



The expected value on the right-hand side of (53) accounts for the probability that policies $\pi_n^*$ and $\pi_\infty$ differ when one renewal has occurred at time $j - \ell$ and no renewal will occur until time $j + m$. For this to happen, we need at least one $i \in \{j - \ell + 1, \ldots, j + m\}$ such that $X_i \in [\xi_{n-i+1}, \xi]$. Since the $X_i$'s are uniformly distributed on $[0, 1]$, the probability that $X_i \in [\xi_{n-i+1}, \xi]$ equals $\xi - \xi_{n-i+1}$ and, by the monotonicity of the minimal fixed points in Lemma 4, we have the upper bound $\xi - \xi_{n-i+1} \le \xi - \xi_{n-(j+m)+1}$ for all $i \in \{j - \ell + 1, \ldots, j + m\}$. Then, we can estimate the right-hand side of (53) with Boole's inequality (each of the $\ell + m$ terms also carries the probability, at most $(1-p)^{\ell+m-2}$, that the other values in the window lie below $5/6$), and obtain that there is a constant $C$ such that

$$E\big[\mathbb{1}(D_j[\ell, m])\, \mathbb{1}(R_j[\ell, m - 1])\, \mathbb{1}(X_{j-\ell} > 5/6 \text{ or } \ell = j)\big] \le C (m + \ell)(\xi - \xi_{n-(j+m)+1})(1 - p)^{\ell + m - 1}.$$

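At this point, $C = 6/5$ would suffice, but subsequently $C$ denotes a Hardy-style constant that may change from line to line. If we use this last bound in (53), we obtain

$$E[d_j^2] \le C \sum_{m=1}^{n-2-j} \sum_{\ell=0}^{j} (\ell + m)^3 (\xi - \xi_{n-(j+m)+1})(1 - p)^{\ell + m - 1},$$

so, if we change variable by applying the transformation $r = j + m$, we have

$$E[d_j^2] \le C \sum_{r=j+1}^{n-2} \sum_{\ell=0}^{j} (\ell + r - j)^3 (\xi - \xi_{n-r+1})(1 - p)^{\ell + r - j - 1}.$$

If we now sum over $1 \le j \le n - 2$ and use the orthogonality of the martingale differences $d_j$, we obtain

$$E[\Delta_n^2] = \sum_{j=1}^{n-2} E[d_j^2] \le C \sum_{j=1}^{n-2} \sum_{r=j+1}^{n-2} \sum_{\ell=0}^{j} (\ell + r - j)^3 (\xi - \xi_{n-r+1})(1 - p)^{\ell + r - j - 1},$$

so if we interchange the first with the second sum and rearrange, we have

$$E[\Delta_n^2] \le C \sum_{r=2}^{n-2} (\xi - \xi_{n-r+1}) \Big\{ \sum_{j=1}^{r-1} \sum_{\ell=0}^{j} (\ell + r - j)^3 (1 - p)^{\ell + r - j - 1} \Big\}.$$

At this point, it is elementary to check that for all $r$ the last double sum is bounded by the constant $\sum_{u=1}^{\infty} u^4 (1 - p)^{u-1}$ (for each value $u = \ell + r - j$ there are at most $u$ admissible pairs $(j, \ell)$). Since $\sum_{r=2}^{n-2} (\xi - \xi_{n-r+1}) = \sum_{k=3}^{n-1} (\xi - \xi_k)$, this completes the proof of our lemma. □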
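The last step of the proof, that the double sum is bounded uniformly in $r$, can be checked numerically for $p = 1/6$, with the exponents as we read them from the derivation above. The sketch compares the double sum with $\sum_{u \ge 1} u^4 (1-p)^{u-1}$, for which it also uses the standard closed form $(1 + 11q + 11q^2 + q^3)/(1-q)^5$ with $q = 1 - p$ (a generating-function identity, not taken from the paper). For $q = 5/6$ this constant is $142956$: finite, though large, which is all the proof requires. The function names are hypothetical.

```python
Q = 5.0 / 6.0   # q = 1 - p with p = 1/6


def double_sum(r):
    """sum_{j=1}^{r-1} sum_{l=0}^{j} (l + r - j)^3 q^(l + r - j - 1)."""
    total = 0.0
    for j in range(1, r):
        for l in range(j + 1):
            u = l + r - j
            total += u ** 3 * Q ** (u - 1)
    return total


def bound_series(terms=2000):
    """Truncation of sum_{u >= 1} u^4 q^(u - 1)."""
    return sum(u ** 4 * Q ** (u - 1) for u in range(1, terms + 1))


def bound_closed():
    """Closed form (1 + 11q + 11q^2 + q^3) / (1 - q)^5 of the same series."""
    return (1 + 11 * Q + 11 * Q ** 2 + Q ** 3) / (1 - Q) ** 5


print(bound_closed(), bound_series())
print(max(double_sum(r) for r in range(2, 80)))
```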






9. Some Perspective 

We have pursued the proof of a specific central limit theorem, but our analysis may be viewed more generally as a case study for a substantial class of Markov decision problems (MDPs). Here, we took advantage of the existence of a policy $\pi_\infty$ that could be viewed heuristically as the "optimal policy at infinity," and the temporal homogeneity of this policy then gave us access to the machinery of Markov additive processes. There are many MDPs that offer similar prospects.

It took some specialized effort to relate the finite horizon policy $\pi_n^*$ to the limiting policy, but the pattern used here offers some general guidance. In almost any MDP, the Bellman equation gives one good prospects for computing the value function, but here we benefited most substantially from our understanding of the geometry of the threshold functions that determine the optimal policy. Our development of this understanding would have been stymied without the guidance provided by Figure 1. If one views our analysis as a case study, then one message is that given almost any MDP one would be wise to begin with the best numerical work that the problem allows.

Finally, the Bellman equation guarantees a natural role for induction in the analysis of many MDPs, but a more nuanced observation that emerges here is that it can be especially profitable to be attentive to various manifestations of supermodularity (or submodularity). Without the special supermodularity properties represented by Q and (17), most of our inductions could not have moved forward. One can anticipate that some aspects of this experience will be present in the analysis of a wide range of MDPs.